Cost-Sensitive Self-Training
Authors
Abstract
In some real-world applications, it is time-consuming or expensive to collect large amounts of labeled data, while unlabeled data is much easier to obtain. Many semi-supervised learning methods have been proposed to address this problem by exploiting the unlabeled data. Moreover, on some datasets different misclassification errors incur different costs, which challenges the common assumption in classification that all classes share the same misclassification cost. For example, misclassifying a fraudulent transaction as legitimate can be far more serious than misclassifying a legitimate transaction as fraudulent. In this paper, we propose a cost-sensitive self-training method (CS-ST) to improve the performance of Naive Bayes when labeled instances are scarce and different misclassification errors carry different costs. CS-ST incorporates the misclassification costs into the learning process of self-training and approximately estimates the misclassification error to guide the selection of unlabeled instances. Experiments on 13 UCI datasets and three text datasets show that, in terms of the total misclassification cost and the number of correctly classified instances from the costlier classes, CS-ST outperforms both the standard self-training method and the base classifier learned from the original labeled data alone.
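To make the scheme concrete, the following is a minimal sketch of cost-sensitive self-training around a Naive Bayes base learner. It illustrates the general idea described in the abstract, not the authors' exact CS-ST procedure: the Gaussian model, the expected-cost selection rule, the cost-matrix convention, and the batch size are all illustrative assumptions.

```python
# Sketch: cost-sensitive self-training with a Naive Bayes base learner.
# Illustrative only -- not the paper's exact CS-ST algorithm.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def expected_costs(proba, cost):
    """proba: (n, K) class posteriors; cost[i, j]: cost of predicting
    class j when the true class is i. Returns (n, K) expected costs."""
    return proba @ cost

def cs_self_train(X_l, y_l, X_u, cost, n_rounds=10, batch=20):
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(n_rounds):
        if len(X_u) == 0:
            break
        clf = GaussianNB().fit(X_l, y_l)
        ec = expected_costs(clf.predict_proba(X_u), cost)
        # Pseudo-label with the minimum-expected-cost class rather than
        # the maximum-posterior class.
        pseudo = ec.argmin(axis=1)
        # Move the instances whose predictions are cheapest (most
        # "cost-safe") into the labeled set.
        keep = np.argsort(ec.min(axis=1))[:batch]
        X_l = np.vstack([X_l, X_u[keep]])
        y_l = np.concatenate([y_l, pseudo[keep]])
        X_u = np.delete(X_u, keep, axis=0)
    return GaussianNB().fit(X_l, y_l)
```

With a hypothetical cost matrix such as cost = np.array([[0, 1], [10, 0]]), where missing a fraud (class 1) is ten times as costly as a false alarm, the selection step favors pseudo-labels that are safe under the asymmetric costs rather than merely high-probability.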
Similar papers
A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate
Support vector machine (SVM) is a popular classification technique that classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic program, so training an SVM amounts to solving an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple misclassification costs which conside...
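For orientation, the textbook cost-sensitive two-class soft-margin SVM weights the slack of each class with its own penalty. This is the standard formulation, shown as a sketch; it is not necessarily the new formulation the paper proposes, and the symbols C_+ and C_- are illustrative notation.

```latex
% Standard cost-sensitive soft-margin SVM: separate slack penalties
% C_+ and C_- for the two classes (illustrative notation).
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2}
  + C_{+}\sum_{i:\,y_i=+1}\xi_i
  + C_{-}\sum_{i:\,y_i=-1}\xi_i \\
\text{s.t.}\quad & y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0 \quad \forall i.
\end{aligned}
```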
Full text
Cost-Aware Pre-Training for Multiclass Cost-Sensitive Deep Learning
Deep learning is one of the most prominent machine learning techniques today, achieving state-of-the-art results on a broad range of applications where automatic feature extraction is needed. Many such applications also demand varying costs for different types of misclassification errors, but it is not clear whether or how such cost information can be incorporated into deep learning to improv...
Full text
A Condensed Representation of Itemsets for Analyzing Their Evolution over Time
On Structured Output Training: Hard Cases and an Efficient Alternative
Sparse Kernel SVMs via Cutting-Plane Training
Hybrid Least-Squares Algorithms for Approximate Policy Evaluation
A Self-training Approach to Cost Sensitive Uncertainty Sampling
Learning Multi-linear Representations of Distributions for Efficient Inference
Cost-Sensitive Learning Based on Bregman Div...
Full text
AdaCost: Misclassification Cost-Sensitive Boosting
AdaCost, a variant of AdaBoost, is a misclassification cost-sensitive boosting method. It uses the cost of misclassifications to update the training distribution on successive boosting rounds, with the aim of reducing the cumulative misclassification cost further than AdaBoost does. We formally show that AdaCost reduces the upper bound on the cumulative misclassification cost of the training set. Empirical...
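The following sketch shows a single AdaCost-style boosting round, assuming labels in {-1, +1} and per-example costs in (0, 1]. The cost-adjustment function follows the form recommended in the AdaCost paper; the decision-stump base learner and the surrounding scaffolding are illustrative choices.

```python
# Sketch: one AdaCost-style boosting round. Costly examples are
# up-weighted more when misclassified and down-weighted less when
# classified correctly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adacost_round(X, y, costs, D):
    """y in {-1, +1}; costs in (0, 1]; D is the current distribution."""
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
    h = stump.predict(X)
    # Cost adjustment: beta = 0.5*c + 0.5 on errors (bigger boost for
    # costly mistakes), beta = -0.5*c + 0.5 when correct (smaller decay
    # for costly examples).
    beta = np.where(h == y, -0.5 * costs + 0.5, 0.5 * costs + 0.5)
    r = np.sum(D * y * h * beta)
    alpha = 0.5 * np.log((1 + r) / (1 - r))
    D_new = D * np.exp(-alpha * y * h * beta)
    return stump, alpha, D_new / D_new.sum()
```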
Full text
A cost-effectiveness analysis of self-debriefing versus instructor debriefing for simulated crises in perioperative medicine in Canada
PURPOSE High-fidelity simulation training is effective for learning crisis resource management (CRM) skills, but cost is a major barrier to incorporating it into the curriculum. The aim of this study was to examine the cost-effectiveness of self-debriefing and traditional instructor debriefing in CRM training programs and to calculate the minimum willingness-to-pay...
Full text